Zoe Warner is Senior Systems Administrator, Mediaflex at the National Film and Sound Archive of Australia. She attended IDCC22 with the support of the DPC Career Development Fund, which is funded by DPC Supporters.
I am grateful to the DPC for facilitating my first attendance at the annual International Digital Curation Conference (IDCC), held virtually from 13 to 16 June 2022. This was a conference that opened my eyes to the big world of data curation (particularly scientific research data) and the challenges therein. These challenges, however, are still somewhat familiar from my stomping ground at the NFSA (National Film and Sound Archive of Australia).
The topic for IDCC this year (IDCC22) was Reusability, setting off existential thoughtlets about the fundamental purpose of storing, well, anything really, but in this context, data in all its forms and flavours.
‘WHY do we do this?’ becomes inextricably linked to ‘for WHO do we do this?’, both driving the underlying needs defined by the FAIR principles of Findability, Accessibility, Interoperability and Reuse, a common touchpoint throughout the conference. Our path leads inevitably to the pragmatic stumbling block of HOW.
The conference began with a delightful virtual sojourn to Stonehenge with tour guide William Kilbride, of the DPC. The cultural icon’s distinctive monoliths still align towards the sunrise on the summer solstice, thousands of years after being placed. They are an undeniable testament to reuse, despite the loss of context around the monument’s original purpose and those who progressively built it. A transcript of the opening keynote lecture is available here; read it and be transported! It was a wonderful beginning, a rallying cry to thoughtfully consider why re-use is so fundamental to our roles as data keepers.
44 presentations followed, beginning with parallel lightning talks to limber up attendees’ minds. Paper presentations then allowed more in-depth examination. The range of presenters and topics outlined the interplay between Preservation and Access in the pursuit of FAIR data; the importance of Community, both in building these repositories and in determining access limitations; and the balance between human resourcing and automated tools in trying to address these challenges. We, of course, also come back to fundamentals: getting good data in now, with solid preservation and data management plans in place, goes a long way towards facilitating reusability later.
One example has stayed with me from Stream B on Communities: a lightning talk, ‘Reusability of long-term ecological monitoring data – a case study from Kilpisjärvi biological station, Finland’, presented by Tanja Lindholm. She outlined the challenges of unique data sets collected over several decades by individual researchers. The data is a rich source of information but has never been used, and this lack of use has compounded issues over time, further diminishing its accessibility and re-use. There are inconsistencies in data description and method, meaning important metadata is missing, as well as inconsistent data entry and corrupted files. Remediating these issues is complicated by data scattered across varying folder structures with unclear ownership, and by the fact that many of the data collectors have retired. The research group focused on one dataset to determine the method and resources required to achieve FAIR outcomes. Three and a half months later, with four data science students, a data manager and a relevant data collector, they had cleaned one dataset, determining that making the remaining 50 datasets reusable would take 14½ years!
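That 14½-year figure looks to be simple scaling from the pilot. Below is a rough check of the arithmetic in Python. It assumes (my assumption, not something stated in the talk) that each of the remaining 50 datasets would take the same three and a half months as the first, and that the same team would clean them one after another. The names used are purely illustrative.

```python
# Rough check of the estimate quoted in the talk.
# Assumption (not from the talk): every remaining dataset takes as long
# as the pilot dataset, and they are cleaned one at a time.
MONTHS_PER_DATASET = 3.5   # time taken to clean the pilot dataset
REMAINING_DATASETS = 50    # datasets still to be made reusable


def years_to_clean(datasets: int, months_each: float) -> float:
    """Total effort in years if datasets are cleaned sequentially."""
    return datasets * months_each / 12


estimate = years_to_clean(REMAINING_DATASETS, MONTHS_PER_DATASET)
print(f"{estimate:.1f} years")  # 175 months, a little over 14.5 years
```

Adding people or tooling would change the per-dataset figure, which is exactly the balance between human resourcing and automation that the conference kept returning to.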
One of the challenges the NFSA faces is an imbalance between the amount of material (a digital tsunami) it acquires and the resources available to process that material into the collection. This imbalance also exposes the potential limitations of processes and data models developed in a less digital age. At the forefront of addressing these issues are WHY and FOR WHO. Although important, it is not enough to ensure the material is safely tucked away in secure storage, with checksums and preservation strategies in place. Also needed are contextual metadata and a catalogue database that is accessible, searchable, and provides a FAIR go for the community it is ultimately maintained for. Digital objects, unlike Stonehenge monoliths, are destined to become meaningless if they exist on their own, with no digital curation or thought for re-use. This is why these closer considerations around re-use for curation and preservation are so fundamental to our work.